
Add HIP backend #135

Open
amd-asalykov wants to merge 8 commits into ScalingIntelligence:main from amd-asalykov:main

Conversation

amd-asalykov commented Jan 22, 2026

How to install:

uv add torch --index pytorch=https://download.pytorch.org/whl/rocm7.1

Run on MI350X/MI355X:

uv run python scripts/generate_and_eval_single_sample.py gpu_arch=gfx950 backend=hip dataset_src=huggingface level=1 problem_id=22 server_type=google model_name=gemini/gemini-2.5-flash

Run on MI300X/MI325X:

uv run python scripts/generate_and_eval_single_sample.py gpu_arch=gfx942 backend=hip dataset_src=huggingface level=1 problem_id=22 server_type=google model_name=gemini/gemini-2.5-flash

simonguozirui (Collaborator) commented Feb 21, 2026

Thanks so much @amd-asalykov, validating on a bare-metal MI350X right now. Also thanks @laasya-konidala for setting things up, checking the codebase, and verifying!

salykova commented Feb 21, 2026

@simonguozirui you might have noticed that in the current implementation we rely on os.environ["CXX"] = "hipcc" in src/kernelbench/prompts/model_new_ex_add_hip.py to make PyTorch's load_inline work with HIP kernels. We don't explicitly tell LLMs to include this line in the generated kernels; instead, we expect that they will include it automatically based on the model_new_ex_add_hip.py example. As an alternative, we could introduce a backend-specific prompt and explicitly ask LLMs to include os.environ["CXX"] = "hipcc".
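For reference, a minimal sketch of the pattern the in-context example relies on. The kernel source and names here are illustrative, not copied from model_new_ex_add_hip.py:

```python
import os

# Setting CXX to hipcc before invoking torch.utils.cpp_extension.load_inline
# routes the extension build through the HIP compiler directly, instead of
# PyTorch hipifying what it assumes is CUDA code.
os.environ["CXX"] = "hipcc"

# Illustrative HIP source; a real generated kernel would follow the
# model_new_ex_add_hip.py example.
hip_source = r"""
#include <hip/hip_runtime.h>

__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}
"""

# On a ROCm machine this would then be compiled with something like:
# from torch.utils.cpp_extension import load_inline
# mod = load_inline(name="add_hip", cpp_sources="", cuda_sources=hip_source)
```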

simonguozirui (Collaborator) commented

Gotcha. os.environ["CXX"] = "hipcc" triggers the HIP compiler and prevents PyTorch from running hipify (we hit this issue last year when @willhu-jpg and I tried to implement AMD support; it would just hipify CUDA code, #37). We can keep that as part of the in-context example, which becomes part of the prompt automatically, the same way we treat other backends like TK that need a special include library (no need for backend-specific warnings and instructions).

Added a few guardrails to ensure separate AMD and NVIDIA code paths.
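A hypothetical sketch of what such a guardrail could look like; the function name and checks are illustrative assumptions, not the actual KernelBench code:

```python
def resolve_backend(backend: str, gpu_arch: str) -> str:
    """Reject mismatched backend/arch combinations so the AMD and NVIDIA
    code paths cannot silently cross over (illustrative sketch)."""
    is_amd_arch = gpu_arch.startswith("gfx")  # e.g. gfx942, gfx950
    if backend == "hip" and not is_amd_arch:
        raise ValueError(f"backend=hip expects an AMD gfx arch, got {gpu_arch!r}")
    if backend == "cuda" and is_amd_arch:
        raise ValueError(f"backend=cuda cannot target AMD arch {gpu_arch!r}")
    return backend
```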

Two more small things to close this off:

  • reward-hack checking: a HIP kernel must use HIP-related keywords (regex-based matching)
  • L2 cache thrashing: to clear the cache we currently allocate a big tensor to thrash it, so for AMD we need to check whether that tensor size still works well there
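The regex-based check could look roughly like this; the keyword list and hit threshold are illustrative assumptions, not the actual implementation:

```python
import re

# HIP-specific patterns a genuinely HIP kernel is expected to contain;
# a submission with too few hits is flagged as a potential reward hack.
HIP_KEYWORDS = [
    r"hip_runtime",         # #include <hip/hip_runtime.h>
    r"__global__",          # kernel qualifier
    r"hipLaunchKernelGGL",  # HIP kernel-launch macro
    r"hipMalloc|hipMemcpy", # HIP runtime API calls
]

def looks_like_hip_kernel(src: str, min_hits: int = 2) -> bool:
    """Return True if the source matches at least min_hits HIP patterns."""
    hits = sum(1 for pat in HIP_KEYWORDS if re.search(pat, src))
    return hits >= min_hits
```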
